Introduction

This document provides a full overview of the conversational corpora curated and analysed in the ACL 2022 paper “From text to interaction: harnessing conversational corpora for humane and diversity-aware language technology”. Currently, the collection includes conversational corpora for 73 spoken languages from 29 phyla. Together, the corpora represent 855 hours of interaction in 1680816 annotations. Not all of these corpora appear in the analyses in the paper, because of differences in size as well as in annotation and segmentation standards. Here we show all of the spoken language corpora we have considered, as a way of offering full transparency about inclusion criteria.

Language overviews

For every language, this report includes a 3-panel overview plot showing (A) the timing of turn-taking (for floor transfers only); (B) the duration of annotations in relation to their transition timing (this provides a quick way to spot oddities in segmentation data); and (C) tokenised words ranked by frequency (with the top 10 displayed).

Plot axes are deliberately not standardized, so that possible outliers remain visible. The figure panels are followed by a few conversation excerpts, randomly sampled from the larger corpus.
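As a rough illustration of what panels A and C summarise, the sketch below derives floor transfer offsets and a word frequency ranking from timed annotations. It is a minimal sketch in Python, not the code behind this report: the tuple layout, the function names floor_transfer_offsets and top_words, the decision to treat any change of participant between consecutive annotations as a floor transfer, and the whitespace tokenisation are all assumptions made for illustration.

```python
from collections import Counter


def floor_transfer_offsets(turns):
    """Return transition timings (ms) for speaker changes within ONE source.

    `turns` is a list of (begin_ms, end_ms, participant) tuples (assumed layout).
    A positive offset is a gap, a negative offset is overlap.
    """
    ordered = sorted(turns, key=lambda t: t[0])
    offsets = []
    for prev, cur in zip(ordered, ordered[1:]):
        # Assumption: only a change of participant counts as a floor transfer.
        if cur[2] != prev[2]:
            offsets.append(cur[0] - prev[1])
    return offsets


def top_words(utterances, n=10):
    """Rank whitespace-separated, lowercased tokens by frequency (assumed tokenisation)."""
    counts = Counter(w for u in utterances for w in u.lower().split())
    return counts.most_common(n)


if __name__ == "__main__":
    demo = [(0, 1000, "A"), (1200, 2000, "B"), (2100, 2500, "B"), (2400, 3000, "A")]
    print(floor_transfer_offsets(demo))
    print(top_words(["yes yes no", "Yes ok"], n=2))
```

Plotting annotation duration against these offsets, as in panel B, is one way to make segmentation oddities stand out, for example a cluster of very long annotations that all start at an offset of exactly zero.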

The remainder of the information comes in tables listing key characteristics of the corpus, including:

  • turns: number of annotations with timing information in the corpus, which in most corpora corresponds to the number of turns at talk
  • translated: the proportion of turns for which there is a translation available in English/French/German (on a scale from 0 to 1)
  • turnduration: mean duration of turns in this corpus (shown in the column mean.duration in the tables below)
  • talkprop: sum of all annotation durations divided by length of source. If >1, indicates a densely annotated recording with quite some overlap. If well below 1, indicates a less densely annotated recording and possibly untranscribed parts.
  • people: total number of distinct participants encountered in all source records for this corpus
  • hours: total number of hours (counting from the first transcription until the last by source)
  • turns_per_h: number of turns per hour in this corpus
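The definitions above can be made concrete with a small sketch. The Python below is an illustration under stated assumptions rather than the code that produced this report: the function name corpus_summary and the dictionary keys are hypothetical, begin and end times are assumed to be in milliseconds, and the length of a source is assumed to be the span from its first to its last timed annotation (the same span the hours figure is described as using).

```python
from collections import defaultdict

MS_PER_HOUR = 3_600_000


def corpus_summary(annotations):
    """Summarise annotations given as dicts with keys
    'source', 'participant', 'begin', 'end' (times in ms; assumed layout)."""
    # Only annotations with timing information count as turns.
    timed = [a for a in annotations
             if a["begin"] is not None and a["end"] is not None]
    if not timed:
        raise ValueError("no annotations with timing information")

    by_source = defaultdict(list)
    for a in timed:
        by_source[a["source"]].append(a)

    # Assumption: source length = first begin to last end, summed over sources.
    span_ms = sum(max(a["end"] for a in anns) - min(a["begin"] for a in anns)
                  for anns in by_source.values())
    if span_ms <= 0:
        raise ValueError("sources have no measurable duration")

    talk_ms = sum(a["end"] - a["begin"] for a in timed)
    hours = span_ms / MS_PER_HOUR
    return {
        "turns": len(timed),
        "turnduration": talk_ms / len(timed),   # mean duration in ms
        "talkprop": talk_ms / span_ms,          # above 1 implies overlap
        "people": len({a["participant"] for a in timed}),
        "hours": hours,
        "turns_per_h": len(timed) / hours,
    }
```

For instance, two sources spanning 2 and 4 seconds with 6.8 seconds of annotated talk between them give a talkprop of about 1.13, which signals overlapping talk.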

Following this is a simple table of types of annotations encountered: at least talk, but possibly also laugh and breath (and sometimes NA). And finally there is a list of source files along with basic descriptive statistics per source.
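A table of annotation types like the ones below can be produced with a simple tally. The sketch assumes each annotation is a dict with a 'nature' key holding a label such as talk or laugh, or None when no type was assigned; the function name and this layout are hypothetical. Labels are sorted alphabetically with NA placed last, which mirrors the ordering seen in the tables in this report.

```python
from collections import Counter


def annotation_type_counts(annotations):
    """Count annotations per type; missing types are reported as 'NA'."""
    counts = Counter(a.get("nature") or "NA" for a in annotations)
    # Sort labels alphabetically (case-insensitive), keeping 'NA' at the end.
    return sorted(counts.items(), key=lambda kv: (kv[0] == "NA", kv[0].lower()))


if __name__ == "__main__":
    demo = [{"nature": "talk"}, {"nature": "laugh"}, {"nature": None},
            {"nature": "[cough]"}, {"nature": "talk"}]
    for nature, n in annotation_type_counts(demo):
        print(nature, n)
```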

Akhoe Hai||om

Short name: akhoe_haikom; glottolog name: Hai//om-Akhoe; glottocode: haio1238; family/type: Khoe-Kwadi; macroarea: Africa

URL: https://hdl.handle.net/1839/b1796725-1a49-48ee-93ea-75e5b440c7bc

0.5 hours

turns translated words mean.duration talkprop people hours turns_per_h
721 1 3253 1201 0.65 18 0.48 1502

annotation types

nature n
laugh 27
talk 674
NA 20

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/akhoe_haikom1/Handcraft_3_Selection_complete 131 1 709 8 0 0.8 3.4 0.06
/akhoe_haikom1/state_hospital 590 1 2544 10 0 0.5 25.3 0.42

Akie

Short name: akie; glottolog name: Akie; glottocode: mosi1247; family/type: Nilotic; macroarea: Africa

URL: https://hdl.handle.net/1839/b17d3caf-83e6-4ee9-8d1a-f9e4f8179971

0.1 hours

turns translated words mean.duration talkprop people hours turns_per_h
231 1 2620 1910 1 2 0.13 1777

annotation types

nature n
talk 231

samples

1 source

source turns translated words people notiming talkprop minutes hours
/akie1/2014-01-20Gitu4ConversationBahatiNkoiseyyo 231 1 2620 2 0 1 7.6 0.13

Akpes (Àbèsàbèsì)

Short name: akpes; glottolog name: Akpes; glottocode: akpe1248; family/type: Atlantic-Congo; macroarea: Africa

URL: https://www.elararchive.org/dk0555

0.3 hours

turns translated words mean.duration talkprop people hours turns_per_h
635 0.49 3965 1753 1 2 0.3 2117

annotation types

nature n
talk 635

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/akpes1/ibe049-00s 187 0.00 753 2 0 1 4.6 0.08
/akpes1/ibe140-00s 448 0.98 3212 2 0 1 13.1 0.22

Ambel

Short name: ambel; glottolog name: Waigeo; glottocode: waig1244; family/type: Austronesian; macroarea: Papunesia

URL: http://hdl.handle.net/2196/00-0000-0000-000C-E849-2

0.7 hours

turns translated words mean.duration talkprop people hours turns_per_h
1509 0.84 6601 1938 1.1 16 0.7 2156

annotation types

nature n
[cough] 3
laugh 102
talk 1386
NA 18

samples

5 sources

source turns translated words people notiming talkprop minutes hours
/ambel1/AM056 146 0.97 747 5 0 1.0 5.9 0.10
/ambel1/AM057 129 0.98 570 3 0 1.2 3.8 0.06
/ambel1/AM064 674 0.73 3038 7 0 1.2 17.9 0.30
/ambel1/AM067 484 0.67 1937 4 0 1.3 11.3 0.19
/ambel1/AM107_0003 76 0.86 309 3 0 0.8 3.1 0.05

Anal Naga

Short name: anal; glottolog name: Anal; glottocode: anal1239; family/type: Sino-Tibetan; macroarea: Eurasia

URL: http://hdl.handle.net/2196/af2415d6-dc75-4330-ba5d-7b8122e50982

4.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
6767 0 26826 1484 0.67 18 4.63 1462

annotation types

nature n
[nod] 1
laugh 12
talk 6010
NA 744

samples

15 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/anal1/anm_20160916_PO_Wolring_1 538 0 2593 4 0 0.7 19.4 0.32
/anal1/anm_20160917_LamphouPasna_Thotson_teashop 1389 0 5027 6 0 0.8 38.1 0.64
/anal1/anm_20160918_LCharu_Rockson_chat 226 0 902 4 0 0.5 12.5 0.21
/anal1/anm_20160918_LCharu_Rockson_dialogue_2 561 0 2007 2 0 0.9 14.8 0.25
/anal1/anm_20160924_Thotson_grandmothers_1 117 0 542 2 0 0.8 4.3 0.07
/anal1/anm_20161013_Jm_Dutang_lunch2 139 0 411 4 0 0.4 6.8 0.11
/anal1/anm_20161014_PO_Darchol_evening_conversation 170 0 601 3 0 0.5 9.9 0.16
/anal1/anm_20161014_PO_Darchol_evening_conversation2 689 0 1998 4 0 0.5 20.1 0.34
/anal1/anm_20161014_PO_Ralruwng_family_lunch1 600 0 1825 3 0 0.3 58.9 0.98
/anal1/anm_20161014_PO_Ralruwng_family_lunch3 111 0 365 3 0 0.2 17.8 0.30

Egyptian Arabic

Short name: arabic; glottolog name: Egyptian Arabic; glottocode: egyp1253; family/type: Afro-Asiatic; macroarea: Africa

URL: https://catalog.ldc.upenn.edu/LDC97S45

20.3 hours

turns translated words mean.duration talkprop people hours turns_per_h
33120 0 201207 2190 1 8 20.3 1632

annotation types

nature n
[cough] 50
breath 83
laugh 491
talk 31605
NA 891

samples

140 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/arabic1/4023 145 0 1538 2 3 1.0 10.1 0.17
/arabic1/4150 198 0 1949 3 5 1.0 10.0 0.17
/arabic1/4194 259 0 1667 3 5 1.0 10.1 0.17
/arabic1/4213 146 0 1580 2 0 1.0 10.0 0.17
/arabic1/4264 265 0 1813 3 8 1.1 10.6 0.18
/arabic1/4283 181 0 1753 3 13 0.9 10.7 0.18
/arabic1/4297 253 0 1605 2 9 1.0 11.4 0.19
/arabic1/4299 121 0 1770 2 5 1.0 10.1 0.17
/arabic1/4345 322 0 1643 3 13 1.0 10.1 0.17
/arabic1/4367 238 0 1037 2 2 1.0 7.3 0.12

Arapaho

Short name: arapaho; glottolog name: Arapaho; glottocode: arap1274; family/type: Algic; macroarea: North America

URL: http://hdl.handle.net/2196/3bba11be-a5e2-47dd-bfe5-42f2ee9e0bf4

4.1 hours

turns translated words mean.duration talkprop people hours turns_per_h
4821 0.89 55850 2251 0.72 34 4.07 1185

annotation types

nature n
[nod] 2
laugh 3
talk 4808
NA 8

samples

32 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/arapaho1/1 62 0.98 656 2 0 0.8 3.7 0.06
/arapaho1/14a 35 0.94 1062 5 2 0.3 7.5 0.12
/arapaho1/14b 12 1.00 37 3 0 0.6 0.6 0.01
/arapaho1/14c 7 0.86 18 3 0 0.7 0.2 0.00
/arapaho1/14d 13 1.00 22 4 0 0.6 0.4 0.01
/arapaho1/14e 15 1.00 78 4 0 0.8 0.4 0.01
/arapaho1/14f 33 0.97 293 4 0 1.0 1.2 0.02
/arapaho1/14g 69 1.00 651 5 0 0.8 2.5 0.04
/arapaho1/14h 9 1.00 91 3 0 0.7 0.3 0.00
/arapaho1/17b 44 0.45 200 7 0 0.6 1.7 0.03

Asimjeeg Datooga

Short name: asimjeeg_datooga; glottolog name: Isimjeega-Rootigaanga; glottocode: isim1234; family/type: Nilotic; macroarea: Africa

URL: http://hdl.handle.net/2196/1e9151d8-df0a-4ea7-bb6d-377b65b14310

0.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
465 0 2843 2612 0.8 1 0.42 1107

annotation types

nature n
talk 465

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/asimjeeg_datooga1/IGS0229_2017-3-1_5 221 0 1521 1 0 0.8 12.9 0.21
/asimjeeg_datooga1/IGS0229_2017-3-3_10 244 0 1322 1 0 0.8 12.4 0.21

Baa

Short name: baa; glottolog name: Baa; glottocode: kwaa1262; family/type: Atlantic-Congo; macroarea: Africa

URL: http://hdl.handle.net/2196/e050a2cd-f61d-435e-824e-93d24877bbaa

1.1 hours

turns translated words mean.duration talkprop people hours turns_per_h
1361 0.96 12553 2506 0.85 7 1.09 1249

annotation types

nature n
talk 1361

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/baa1/KWB008 1003 0.99 9693 2 0 0.9 46.3 0.77
/baa1/KWB033 358 0.94 2860 5 0 0.8 19.0 0.32

Besemah

Short name: besemah; glottolog name: Musi; glottocode: musi1241; family/type: Austronesian; macroarea: Papunesia

URL: https://hdl.handle.net/1839/00-0000-0000-0022-6B59-B

2.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
5106 1 20316 1962 1.12 14 2.41 2119

annotation types

nature n
talk 5078
NA 28

samples

4 sources

source turns translated words people notiming talkprop minutes hours
/besemah1/BES-20130426-HEN 982 1 3035 9 0 1.2 26.4 0.44
/besemah1/BES-20130506-HEN 2292 1 8954 3 0 1.3 59.3 0.99
/besemah1/BJM01-002-01 739 1 4297 3 0 0.9 31.0 0.52
/besemah1/BJM01-015-01 1093 1 4030 3 0 1.1 27.5 0.46

Brazilian Portuguese

Short name: brazilian_portuguese; glottolog name: Brazilian Portuguese; glottocode: braz1246; family/type: Indo-European; macroarea: South America

URL: https://fale.ufal.br/projeto/nurcdigital/

1.5 hours

turns translated words mean.duration talkprop people hours turns_per_h
3242 0 17109 1633 1 2 1.47 2205

annotation types

nature n
talk 3242

samples

1 source

source turns translated words people notiming talkprop minutes hours
/brazilian_portuguese1/NURC_RE_D2_340 3242 0 17109 2 0 1 88.4 1.47

Catalan

Short name: catalan; glottolog name: Catalan; glottocode: stan1289; family/type: Indo-European; macroarea: Eurasia

URL: https://catalog.elra.info/en-us/repository/browse/ELRA-S0407/

6.7 hours

turns translated words mean.duration talkprop people hours turns_per_h
11059 0 93827 1992 0.91 24 6.65 1663

annotation types

nature n
[blow] 12
breath 27
laugh 103
talk 10912
NA 5

samples

42 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/catalan1/ca_f01r_m04r_fcd 351 0 3079 2 0 1.1 10.1 0.17
/catalan1/ca_f01r_m04r_tod 406 0 3851 2 0 1.1 13.1 0.22
/catalan1/ca_f01r_m04r_trd 414 0 3495 2 0 0.9 16.4 0.27
/catalan1/ca_f01r_m04r_und 434 0 3621 2 0 1.0 15.2 0.25
/catalan1/ca_f02a_m05a_fcd 392 0 3139 2 0 1.2 10.1 0.17
/catalan1/ca_f02a_m05a_tod 483 0 3681 2 0 1.1 12.6 0.21
/catalan1/ca_f02a_m05a_trd 338 0 2424 2 0 1.0 10.0 0.17
/catalan1/ca_f02a_m05a_und 481 0 2665 2 0 1.0 11.0 0.18
/catalan1/ca_f37s_f38s_fcd 439 0 3885 2 0 1.0 12.6 0.21
/catalan1/ca_f37s_f38s_tod 182 0 1883 2 0 0.9 7.4 0.12

Chitkuli Kinnauri

Short name: chitkuli; glottolog name: Chhitkul-Rakchham; glottocode: chit1279; family/type: Sino-Tibetan; macroarea: Eurasia

URL: http://hdl.handle.net/2196/cf110665-3694-4e74-a8f8-79e105d89b50

1.1 hours

turns translated words mean.duration talkprop people hours turns_per_h
1123 0.9 13462 4560 1.25 15 1.14 985

annotation types

nature n
talk 1123

samples

10 sources

source turns translated words people notiming talkprop minutes hours
/chitkuli1/DEB_cik01-RK-BSN1-2018-10-15 88 1 1056 2 0 1.0 6.6 0.11
/chitkuli1/DEB_cik03-GD-AS-2018-11-01 261 1 1936 2 0 1.1 14.0 0.23
/chitkuli1/DEB_cik04-CRN-YS1-2018-11-22 41 1 805 2 0 1.0 3.9 0.07
/chitkuli1/DEB_cik06-BS2-TS-2019-05-26 97 1 1958 2 0 1.6 7.0 0.12
/chitkuli1/DEB_cik08-BSN2-HN-2019-05-28 193 1 2544 2 0 1.5 10.8 0.18
/chitkuli1/NDB_cik01-VKN-NB1-2018-11-21 82 1 848 2 0 1.3 4.6 0.08
/chitkuli1/NDB_cik09-SD3-SD4-2019-05-27 106 1 1671 2 0 1.7 6.5 0.11
/chitkuli1/NDB_cik10-MB-RB1-2019-05-28 113 0 961 2 0 1.1 6.1 0.10
/chitkuli1/TRD_cik06-BS1-AD-2019-03-07 81 1 813 2 0 1.1 3.8 0.06
/chitkuli1/TRD_cik11-SD2-NB2-2019-04-11 61 1 870 2 0 1.1 4.8 0.08

Cora

Short name: cora; glottolog name: Santa Teresa Cora; glottocode: sant1424; family/type: Uto-Aztecan; macroarea: North America

URL: http://hdl.handle.net/2196/0829a3a6-92c4-4346-8e37-04845cdd1f7f

0.8 hours

turns translated words mean.duration talkprop people hours turns_per_h
913 0 4866 3300 1.05 4 0.78 1171

annotation types

nature n
talk 913

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/cora1/cora_sjc065 684 0 2549 2 0 1.0 24.0 0.40
/cora1/cora_sjc106 229 0 2317 2 0 1.1 23.1 0.38

Croatian

Short name: croatian; glottolog name: Croatian Standard; glottocode: croa1245; family/type: Indo-European; macroarea: Eurasia

URL: https://ca.talkbank.org/access/Croatian.html

24.1 hours

turns translated words mean.duration talkprop people hours turns_per_h
59946 0 310196 1407 0.8 392 24.12 2485

annotation types

nature n
[cough] 38
[sigh] 8
[yawn] 2
laugh 734
talk 58613
NA 551

samples

161 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/croatian1/2011_56 407 0 3235 3 407 0 -Inf 0
/croatian1/2011_57 438 0 2214 2 438 0 -Inf 0
/croatian1/2011_58 304 0 1523 3 304 0 -Inf 0
/croatian1/2011_59 317 0 2182 2 317 0 -Inf 0
/croatian1/2011_60 423 0 2486 3 423 0 -Inf 0
/croatian1/2011_61 561 0 3351 2 561 0 -Inf 0
/croatian1/2011_62 582 0 2833 4 582 0 -Inf 0
/croatian1/2011_63 354 0 2310 5 354 0 -Inf 0
/croatian1/2011_64 416 0 2071 3 416 0 -Inf 0
/croatian1/2011_65 87 0 618 2 87 0 -Inf 0

Czech

Short name: czech; glottolog name: Czech; glottocode: czec1258; family/type: Indo-European; macroarea: Eurasia

URL: https://mirjamernestus.nl/Ernestus/NCCCz/index.php

28.7 hours

turns translated words mean.duration talkprop people hours turns_per_h
63826 0 354079 2267 1.4 3 28.69 2225

annotation types

nature n
breath 1408
laugh 10370
talk 50843
NA 1205

samples

19 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/czech1/10_181108 3369 0 20389 3 2 1.5 90.5 1.51
/czech1/11_181108 3144 0 17910 3 10 1.3 90.2 1.50
/czech1/12_191108 3036 0 18983 3 0 1.4 90.2 1.50
/czech1/13_201108 3545 0 17234 3 1 1.3 89.9 1.50
/czech1/15_211108 4654 0 20856 3 1 1.6 90.6 1.51
/czech1/16_211108 3712 0 19034 3 1 1.3 90.7 1.51
/czech1/18_241108 3264 0 21347 3 3 1.4 90.6 1.51
/czech1/19_241108 2770 0 17503 3 0 1.1 90.3 1.50
/czech1/20_251108 4318 0 17085 3 0 1.5 90.9 1.52
/czech1/21_261108 3792 0 20596 3 1 1.4 91.4 1.52

Danish

Short name: danish; glottolog name: Danish; glottocode: dani1285; family/type: Indo-European; macroarea: Eurasia

URL: https://samtalebank.talkbank.org/

3.3 hours

turns translated words mean.duration talkprop people hours turns_per_h
6115 0 39260 1418 0.81 22 3.34 1831

annotation types

nature n
[sniff] 5
breath 22
laugh 7
talk 5887
NA 194

samples

9 sources

source turns translated words people notiming talkprop minutes hours
/danish1/225_deller 1239 0 7966 6 18 0.6 50.0 0.83
/danish1/anne_og_beate 307 0 2811 2 21 0.9 10.1 0.17
/danish1/gamledage 689 0 3282 3 15 1.0 13.0 0.22
/danish1/kartofler_og_broccoli 824 0 5609 4 6 0.5 43.1 0.72
/danish1/madlavning 280 0 1980 3 2 0.7 11.9 0.20
/danish1/omfodbold 812 0 4132 4 54 1.1 15.2 0.25
/danish1/politiforhoer 168 0 1472 5 3 0.6 7.8 0.13
/danish1/preben_og_thomas 1015 0 7591 3 19 0.8 30.6 0.51
/danish1/samfundskrise 781 0 4417 2 33 1.1 18.5 0.31

Duoxu

Short name: duoxu; glottolog name: Ersu; glottocode: ersu1241; family/type: Sino-Tibetan; macroarea: Eurasia

URL: NA

0.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
327 0.5 3128 3530 0.8 4 0.4 818

annotation types

nature n
talk 327

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/duoxu1/duoxu800 157 1 1460 2 0 0.7 11.8 0.2
/duoxu1/duoxu801 170 0 1668 2 0 0.9 11.8 0.2

Dutch

Short name: dutch; glottolog name: Dutch; glottocode: dutc1256; family/type: Indo-European; macroarea: Eurasia

URL: http://hdl.handle.net/10032/tm-a2-d9

387.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
826207 0 4736704 1674 1.01 1269 387.61 2132

annotation types

nature n
laugh 28742
talk 789104
NA 8361

samples

2787 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/dutch1/fn000248 318 0 1495 2 0 0.9 8.0 0.13
/dutch1/fn000249 442 0 2095 2 0 0.9 11.3 0.19
/dutch1/fn000250 556 0 2773 2 0 0.9 13.5 0.23
/dutch1/fn000251 684 0 3142 2 0 0.8 17.9 0.30
/dutch1/fn000252 429 0 2312 2 0 0.8 13.2 0.22
/dutch1/fn000253 511 0 2708 2 0 0.9 14.7 0.24
/dutch1/fn000254 752 0 4023 3 0 1.0 21.2 0.35
/dutch1/fn000259 437 0 2158 2 0 0.8 12.2 0.20
/dutch1/fn000260 650 0 3124 2 0 0.9 16.3 0.27
/dutch1/fn000261 512 0 2704 2 0 0.9 13.3 0.22

English

Short name: english; glottolog name: North American English; glottocode: nort3314; family/type: Indo-European; macroarea: North America

URL: https://ca.talkbank.org/access/CallFriend/eng-n.html

28 hours

turns translated words mean.duration talkprop people hours turns_per_h
55187 0 348394 1698 0.93 35 28 1971

annotation types

nature n
[clearsthroat] 24
[cough] 39
[groan] 3
[inhales] 24
[lipsmack] 47
[sigh] 6
[sneeze] 13
[sniff] 38
[yawn] 4
breath 1514
laugh 983
talk 51897
NA 595

samples

171 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/english2/4175 1130 0 6114 4 65 0.9 29.9 0.50
/english2/4504 266 0 1616 3 32 0.9 7.8 0.13
/english2/4708 108 0 779 2 0 0.7 3.1 0.05
/english2/4745 7 0 17 2 1 1.0 0.1 0.00
/english2/4823 119 0 1124 2 1 0.5 5.5 0.09
/english2/4874 140 0 425 2 6 0.8 2.7 0.05
/english2/4889 1423 0 7630 3 58 1.0 25.3 0.42
/english2/4984 1277 0 6657 4 114 1.0 30.0 0.50
/english2/5000 1294 0 7351 2 90 1.0 30.0 0.50
/english2/5051 72 0 516 2 13 1.0 2.8 0.05

Farsi

Short name: farsi; glottolog name: Western Farsi; glottocode: west2369; family/type: Indo-European; macroarea: Eurasia

URL: https://catalog.ldc.upenn.edu/LDC2014S01

25.3 hours

turns translated words mean.duration talkprop people hours turns_per_h
33045 0 239763 2773 1.03 8 25.3 1306

annotation types

nature n
[cough] 57
[lipsmack] 14
laugh 486
talk 32334
NA 154

samples

100 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/farsi1/fa_4046 501 0 3622 2 0 0.9 24.6 0.41
/farsi1/fa_4054 369 0 1812 2 0 0.4 26.7 0.44
/farsi1/fa_4099 311 0 1928 2 0 0.9 15.8 0.26
/farsi1/fa_4117 335 0 2854 2 0 1.1 20.5 0.34
/farsi1/fa_4130 326 0 2514 2 0 1.0 15.1 0.25
/farsi1/fa_4146 352 0 2365 2 0 0.8 17.1 0.28
/farsi1/fa_4218 384 0 2445 2 0 1.0 18.8 0.31
/farsi1/fa_4219 477 0 2992 2 0 1.0 19.7 0.33
/farsi1/fa_4221 253 0 1704 2 0 1.1 10.1 0.17
/farsi1/fa_4230 136 0 816 2 0 0.9 6.7 0.11

French

Short name: french; glottolog name: French; glottocode: stan1290; family/type: Indo-European; macroarea: Eurasia

URL: https://mirjamernestus.nl/Ernestus/NCCFr/index.php

31.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
37692 0 402228 2463 0.81 40 31.41 1200

annotation types

nature n
talk 37692

samples

20 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/french2/03-12-07_1 1494 0 14140 2 0 0.6 88.9 1.48
/french2/04-12-07_1 1742 0 19399 2 0 0.7 95.5 1.59
/french2/05-12-07_1 1541 0 18107 2 0 0.7 92.9 1.55
/french2/14-11-07_1 2071 0 22531 2 0 0.8 108.5 1.81
/french2/16-11-07_1 1530 0 19715 2 0 0.9 90.9 1.52
/french2/16-11-07_2 1472 0 17269 2 0 0.7 86.8 1.45
/french2/20-11-07_1 1874 0 13923 2 0 0.6 92.1 1.54
/french2/22-11-07_1 2012 0 23219 2 0 1.0 90.9 1.51
/french2/22-11-07_2 2024 0 22210 2 0 1.0 90.4 1.51
/french2/23-11-07_1 1776 0 16585 2 0 0.7 90.0 1.50

German

Short name: german; glottolog name: German; glottocode: stan1295; family/type: Indo-European; macroarea: Eurasia

URL: https://catalog.ldc.upenn.edu/LDC97S43

18.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
35104 0 216674 1957 1.04 5 18.63 1884

annotation types

nature n
[clearsthroat] 32
[cough] 23
[lipsmack] 2
[sigh] 19
[sneeze] 2
[sniff] 7
breath 493
laugh 1254
talk 32155
NA 1117

samples

120 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/german1/4002 339 0 2237 2 58 1.1 10.1 0.17
/german1/4024 267 0 2035 2 43 1.0 10.2 0.17
/german1/4028 336 0 2030 2 88 1.0 10.0 0.17
/german1/4049 157 0 1055 2 30 1.1 5.0 0.08
/german1/4073 263 0 1682 2 41 1.0 10.1 0.17
/german1/4076 353 0 1817 2 77 1.0 10.1 0.17
/german1/4111 265 0 2231 2 78 1.0 10.1 0.17
/german1/4123 268 0 1839 2 62 1.0 10.0 0.17
/german1/4124 96 0 965 2 18 1.0 5.1 0.08
/german1/4287 275 0 1820 2 82 0.9 10.1 0.17

Bininj Gun-Wok

Short name: gunwinggu; glottolog name: Bininj Kun-Wok; glottocode: gunw1252; family/type: Gunwinyguan; macroarea: Australia

URL: https://dx.doi.org/10.4225/72/56E97A3F99539

0.2 hours

turns translated words mean.duration talkprop people hours turns_per_h
275 0.92 666 1188 0.5 7 0.19 1447

annotation types

nature n
laugh 5
talk 267
NA 3

samples

1 source

source turns translated words people notiming talkprop minutes hours
/gunwinggu1/SI1-004-transcr 275 0.92 666 7 0 0.5 11.2 0.19

Gutob

Short name: gutob; glottolog name: Bodo Gadaba; glottocode: bodo1267; family/type: Austroasiatic; macroarea: Eurasia

URL: http://hdl.handle.net/2196/f027a3a2-d38f-4428-88ec-33b46d346cb3

2.2 hours

turns translated words mean.duration talkprop people hours turns_per_h
4820 0.06 15643 1402 0.84 14 2.19 2201

annotation types

nature n
[cough] 5
laugh 55
talk 4721
NA 39

samples

5 sources

source turns translated words people notiming talkprop minutes hours
/gutob1/gutob-0444-20161205_1 189 0.07 489 3 0 0.7 5.5 0.09
/gutob1/Gutob-0444-20161223 535 0.07 2012 4 0 0.8 17.0 0.28
/gutob1/Gutob-0444-20180307_1 783 0.08 2774 5 0 0.9 23.1 0.39
/gutob1/Gutob-0444-20180307_2 1159 0.04 3842 4 0 1.0 23.7 0.40
/gutob1/Gutob-0444-20180328 2154 0.06 6526 3 0 0.8 61.8 1.03

Hausa

Short name: hausa; glottolog name: Hausa; glottocode: haus1257; family/type: Afro-Asiatic; macroarea: Africa

URL: http://www.language-archives.org/language/hau

0.8 hours

turns translated words mean.duration talkprop people hours turns_per_h
3152 0 10726 855 0.92 2 0.85 3708

annotation types

nature n
laugh 25
talk 3113
NA 14

samples

4 sources

source turns translated words people notiming talkprop minutes hours
/hausa1/HAU_BC_CONV_01_BOYS 738 0 2512 2 0 1.0 10.0 0.17
/hausa1/HAU_BC_CONV_02_BOYS 515 0 1713 2 0 1.1 6.2 0.10
/hausa1/HAU_BC_CONV_03_GIRLS 354 0 1388 2 0 0.8 7.2 0.12
/hausa1/HAU_BC_CONV_04_MEN 1545 0 5113 2 0 0.8 27.3 0.46

Heyo

Short name: heyo; glottolog name: Heyo; glottocode: heyo1240; family/type: Nuclear Torricelli; macroarea: Papunesia

URL: https://www.elararchive.org/dk0550/

0.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
600 0.98 3011 2305 0.6 4 0.63 952

annotation types

nature n
talk 597
NA 3

samples

1 source

source turns translated words people notiming talkprop minutes hours
/heyo1/heyo048_0002 600 0.98 3011 4 0 0.6 38 0.63

Hungarian

Short name: hungarian; glottolog name: Hungarian; glottocode: hung1274; family/type: Uralic; macroarea: Eurasia

URL: https://hucomtech.unideb.hu/hucomtech/

49.7 hours

turns translated words mean.duration talkprop people hours turns_per_h
115830 0 407033 1486 0.94 2 49.72 2330

annotation types

nature n
[cough] 150
breath 905
laugh 1856
talk 111399
NA 1520

samples

224 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/hungarian1/003mv19_F_a_v 311 0 1087 2 0 1.0 8.7 0.14
/hungarian1/003mv19_I_a_v 842 0 3536 2 0 1.0 22.2 0.37
/hungarian1/006mc22_F_a_v 354 0 1293 2 1 0.9 12.3 0.21
/hungarian1/006mc22_I_a_v 727 0 2717 2 0 1.1 16.7 0.28
/hungarian1/007mc24_F_a_v 301 0 1046 2 0 0.9 10.6 0.18
/hungarian1/007mc24_I_a_v 962 0 3563 2 0 1.0 23.2 0.39
/hungarian1/008mc20_F_a_v 154 0 547 2 0 0.7 5.4 0.09
/hungarian1/008mc20_I_a 253 0 1046 2 0 0.9 7.3 0.12
/hungarian1/012mc25_F_a 250 0 1004 2 0 0.9 10.0 0.17
/hungarian1/012mc25_I_a 665 0 2291 2 0 0.9 18.4 0.31

Italian

Short name: italian; glottolog name: Italian; glottocode: ital1282; family/type: Indo-European; macroarea: Eurasia

URL: https://www.sciencedirect.com/science/article/pii/S0167639321000303

5.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
8854 1 59610 2407 1.11 20 5.38 1646

annotation types

nature n
[cough] 8
breath 389
laugh 375
talk 8071
NA 11

samples

10 sources

source turns translated words people notiming talkprop minutes hours
/italian1/D01_01BF47 _02BF47 1030 1 4732 2 1 0.9 33.0 0.55
/italian1/D02_03BM59 _04BM56 1019 1 7274 2 0 1.2 37.9 0.63
/italian1/D04_07BF55_08BF55 910 1 7697 2 0 1.2 37.8 0.63
/italian1/D05_09BF52_10BF52 1061 1 5943 2 0 1.0 33.7 0.56
/italian1/D06_11LF28 _12LF30 778 1 5388 2 0 1.1 29.3 0.49
/italian1/D08_15BM22 _16BF22 902 1 6382 2 0 1.1 28.5 0.47
/italian1/D11_21BM60 _22BM51 606 1 4618 2 0 1.0 32.7 0.54
/italian1/D12_23LM30 _24LF27 785 1 5465 2 0 1.1 28.2 0.47
/italian1/D13_25LF23 _26BF24 850 1 6086 2 0 1.3 30.0 0.50
/italian1/D15_29BF21 _30BM23 913 1 6025 2 0 1.2 32.2 0.54

Japanese

Short name: japanese; glottolog name: Japanese; glottocode: nucl1643; family/type: Japonic; macroarea: Eurasia

URL: https://ca.talkbank.org/access/CallFriend/jpn.html

13.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
32955 0 163519 1331 0.9 37 13.42 2456

annotation types

nature n
[cough] 10
[sneeze] 3
breath 1143
laugh 78
talk 31402
NA 319

samples

32 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/japanese3/0921 802 0 3758 2 9 0.9 15.0 0.25
/japanese3/1367 700 0 3824 2 8 1.0 15.0 0.25
/japanese3/1605 1171 0 6096 2 7 1.0 29.0 0.48
/japanese3/1612 803 0 3720 3 7 0.9 18.3 0.30
/japanese3/1684 1228 0 6792 2 59 0.9 30.0 0.50
/japanese3/1722 1332 0 5206 2 194 0.9 30.0 0.50
/japanese3/1758 1319 0 5783 3 15 1.0 30.0 0.50
/japanese3/1773 612 0 3269 2 2 0.9 16.3 0.27
/japanese3/1841 1190 0 7125 3 5 0.9 30.0 0.50
/japanese3/2167 1277 0 6117 3 13 1.1 30.0 0.50

Jejueo

Short name: jejueo; glottolog name: Jejueo; glottocode: jeju1234; family/type: Koreanic; macroarea: Eurasia

URL: https://www.elararchive.org/dk0351/

3 hours

turns translated words mean.duration talkprop people hours turns_per_h
4270 0.25 18719 2398 0.96 13 3.01 1419

annotation types

nature n
laugh 6
talk 4263
NA 1

samples

8 sources

source turns translated words people notiming talkprop minutes hours
/jejueo1/jeju0022_edited 681 0.00 3641 2 0 0.9 38.3 0.64
/jejueo1/jeju0080-08 107 0.00 675 2 0 1.1 5.8 0.10
/jejueo1/jeju0105 401 0.00 1416 2 0 0.9 15.3 0.25
/jejueo1/jeju0116-01-02 313 0.00 1270 5 0 1.0 13.8 0.23
/jejueo1/jeju0116-04-07 568 0.00 2546 4 0 0.8 30.9 0.51
/jejueo1/jeju0133 1093 0.02 4220 3 0 1.1 35.1 0.59
/jejueo1/jeju0162 923 1.00 4061 3 0 0.9 34.6 0.58
/jejueo1/jeju0168-01_interlinearised_0002 184 1.00 890 2 0 1.0 6.4 0.11

Juba Creole

Short name: juba_creole; glottolog name: South Sudanese Creole Arabic; glottocode: suda1237; family/type: Afro-Asiatic; macroarea: Africa

URL: http://www.language-archives.org/language/pga

0.5 hours

turns translated words mean.duration talkprop people hours turns_per_h
1662 1 6266 865 0.85 2 0.46 3613

annotation types

nature n
talk 1662

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/juba_creole1/PGA_SM_CONV_1 674 1 2420 2 0 0.8 11.3 0.19
/juba_creole1/PGA_SM_CONV_2 988 1 3846 2 0 0.9 16.1 0.27

Kakabe

Short name: kakabe; glottolog name: Kakabe; glottocode: kaka1265; family/type: Mande; macroarea: Africa

URL: http://hdl.handle.net/2196/3015b4c3-1ffc-4cc5-8309-f05f9d4ce8b2

1.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
1812 0.97 15708 3145 0.98 28 1.59 1140

annotation types

nature n
talk 1808
NA 4

samples

5 sources

source turns translated words people notiming talkprop minutes hours
/kakabe1/kke-c_2013-12-07_talk-02 337 0.98 3135 8 0 1.0 15.6 0.26
/kakabe1/kke-c_2013-12-07_talk-04 192 0.97 2344 4 0 1.0 12.4 0.21
/kakabe1/kke-c_2013-12-21_labiko-1 343 1.00 3224 5 0 1.1 14.4 0.24
/kakabe1/kke-c_2013-12-21_labiko-smithy 257 1.00 1725 6 0 0.8 12.8 0.21
/kakabe1/kke-c_2013-12-22_jinkoya-talk-2 683 0.90 5280 11 0 1.0 40.4 0.67

Kelabit

Short name: kelabit; glottolog name: Kelabit; glottocode: kela1258; family/type: Austronesian; macroarea: Papunesia

URL: https://www.elararchive.org/dk0301/

0.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
1080 1 5733 2073 1.06 3 0.59 1831

annotation types

nature n
laugh 3
talk 1077

samples

5 sources

source turns translated words people notiming talkprop minutes hours
/kelabit1/BAR01082014CH_03 178 1 1019 3 0 0.9 6.9 0.12
/kelabit1/BAR01082014CH_04 149 1 668 3 0 1.1 3.9 0.06
/kelabit1/BAR08092014CH_05 346 1 1861 2 0 1.1 12.1 0.20
/kelabit1/BAR08092014CH_06 205 1 1047 2 0 1.1 6.0 0.10
/kelabit1/BAR17082014CH_10 202 1 1138 2 0 1.1 6.7 0.11

Kerinci

Short name: kerinci; glottolog name: Kerinci; glottocode: keri1250; family/type: Austronesian; macroarea: Papunesia

URL: https://archive.mpi.nl/tla/islandora/object/tla%3A1839_00_0000_0000_0022_654E_D

4.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
12705 0 57160 1066 0.53 31 4.37 2907

annotation types

nature n
laugh 51
talk 12591
NA 63

samples

11 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/kerinci1/KER-20070205-FAD 1065 0 4982 6 1065 0.0 -Inf 0.00
/kerinci1/KER-20070925-FAD 263 0 973 3 263 0.0 -Inf 0.00
/kerinci1/KER-20071018-FAD 368 0 1636 5 368 0.0 -Inf 0.00
/kerinci1/KER-20100207-FAD 1543 0 9466 4 1543 0.0 -Inf 0.00
/kerinci1/KER-20100210-FAD 1369 0 6240 6 6 0.8 43.9 0.73
/kerinci1/KER-20110611-FAD 1502 0 5786 6 5 0.9 30.8 0.51
/kerinci1/KER-20120129-FAD 1902 0 6899 2 1 0.7 54.5 0.91
/kerinci1/KER-20120201-FAD 1289 0 4954 2 4 0.8 29.8 0.50
/kerinci1/KER-20120206-FADb 1808 0 9254 4 2 0.9 57.1 0.95
/kerinci1/KER-20140807-FAD 449 0 1708 5 0 0.7 14.2 0.24

Khinalug

Short name: khinalug; glottolog name: Khinalug; glottocode: khin1240; family/type: Nakh-Daghestanian; macroarea: Eurasia

URL: https://hdl.handle.net/1839/c09498f1-12dc-4a7a-b21e-99a178660ff8

0.3 hours

turns translated words mean.duration talkprop people hours turns_per_h
328 0.95 1837 3678 0.97 5 0.35 937

annotation types

nature n
talk 328

samples

3 sources

source turns translated words people notiming talkprop minutes hours
/khinalug1/Agasi02A_06_2012 127 0.93 778 2 0 0.9 7.1 0.12
/khinalug1/Kamal03V_03_2013 151 0.94 698 2 0 1.0 10.5 0.17
/khinalug1/Rahman02A_06_2012 50 0.98 361 2 0 1.0 3.3 0.06

Korean

Short name: korean; glottolog name: Korean; glottocode: kore1280; family/type: Koreanic; macroarea: Eurasia

URL: https://catalog.ldc.upenn.edu/LDC96S54

26.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
42750 0 229721 2545 1.14 5 26.56 1610

annotation types

nature n
[cough] 49
[lipsmack] 70
breath 302
laugh 973
talk 40983
NA 373

samples

100 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/korean1/4012 349 0 2315 2 0 1.4 15.8 0.26
/korean1/4102 471 0 2588 2 0 1.2 15.6 0.26
/korean1/4211 418 0 2302 2 0 1.0 15.4 0.26
/korean1/4296 399 0 2159 2 0 1.0 15.1 0.25
/korean1/4314 444 0 2444 2 0 1.1 15.3 0.26
/korean1/4328 669 0 1794 3 0 0.9 16.2 0.27
/korean1/4361 282 0 1973 2 0 1.1 15.6 0.26
/korean1/4434 332 0 1658 2 0 1.1 16.9 0.28
/korean1/4478 508 0 2413 2 0 1.1 15.0 0.25
/korean1/4546 351 0 2564 2 0 1.4 15.3 0.25

Kula

Short name: kula; glottolog name: Kula (Indonesia); glottocode: kula1280; family/type: Timor-Alor-Pantar; macroarea: Papunesia

URL: https://www.elararchive.org/uncategorized/SO_0320f6f6-97d4-483f-88fa-755b4eeadc2f/?pg=1&hh_cmis_filter=imdi.writtenFileType/ELAN

2.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
3885 0.53 16346 1939 0.78 8 2.65 1466

annotation types

nature n
laugh 31
talk 3742
NA 112

samples

13 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/kula1/al-tpg-201310208-01_0002 398 0.66 1780 8 0 1.0 12.8 0.21
/kula1/al-tpg-20131123-04_0002 510 0.55 2156 8 0 0.9 19.4 0.32
/kula1/nw-tpg-20120605-01_0002 244 0.65 965 4 0 0.7 11.4 0.19
/kula1/nw-tpg-20120605-02A_0002 190 0.26 658 3 0 0.8 5.5 0.09
/kula1/nw-tpg-20120605-03_0002 610 0.65 1980 6 0 0.7 26.4 0.44
/kula1/nw-tpg-20121021-01 131 0.42 427 6 0 0.9 4.0 0.07
/kula1/nw-tpg-20121114-01 304 0.47 1342 5 0 0.7 10.9 0.18
/kula1/nw-tpg-20121121-07 315 0.69 1429 7 0 0.8 14.5 0.24
/kula1/nw-tpg-20121207-01 159 0.38 947 6 0 0.6 8.6 0.14
/kula1/nw-tpg-20130103-04 135 0.49 675 4 0 0.9 6.1 0.10

Laal

Short name: laal; glottolog name: Laal; glottocode: laal1242; family/type: Laal; macroarea: Africa

URL: https://hdl.handle.net/1839/93472197-4462-489c-8cee-0d9a3587f3e5

0.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
530 0.72 3390 1562 0.7 7 0.4 1325

annotation types

nature n
talk 528
NA 2

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/laal1/GDM-Go_140310_F_ND2-KN2-HN1-ID1_09_Entretien-conversation 290 0.85 2201 4 0 0.5 17.9 0.3
/laal1/GDM-Go_20121108_F_hommes_Conversation 240 0.60 1189 4 0 0.9 5.9 0.1

Limassa

Short name: limassa; glottolog name: Limassa; glottocode: lima1246; family/type: Atlantic-Congo; macroarea: Africa

URL: https://www.elararchive.org/uncategorized/SO_432bf24a-0f7b-45cc-82aa-bc3dfa384956/?pg=1&hh_cmis_filter=imdi.genre/Conversation|imdi.genre/Interactive%20discourse|imdi.writtenFileType/ELAN

0.9 hours

turns translated words mean.duration talkprop people hours turns_per_h
881 0 5326 1917 0.52 2 0.89 990

annotation types

nature n
talk 881

samples

4 sources

source turns translated words people notiming talkprop minutes hours
/limassa1/BME_BW_BoSy_MbGh_C_001-1 224 0 1301 2 0 0.6 16.5 0.28
/limassa1/BME_BW_BoSy_MbGh_C_001-2 290 0 1693 2 0 0.5 18.3 0.30
/limassa1/BME_BW_BoSy_MbGh_C_001-3 98 0 413 2 0 0.3 8.6 0.14
/limassa1/BME_BW_MoEm_BoSy_C_002 269 0 1919 2 0 0.7 10.4 0.17

Mambila

Short name: mambila; glottolog name: Donga Mambila; glottocode: came1252; family/type: Atlantic-Congo; macroarea: Africa

URL: http://hdl.handle.net/2196/a3bb258a-6738-43dd-9090-a0ee1853d399

0.7 hours

turns translated words mean.duration talkprop people hours turns_per_h
1320 0 8144 1880 0.9 14 0.74 1784

annotation types

nature n
talk 1298
NA 22

samples

1 sources

source turns translated words people notiming talkprop minutes hours
/mambila1/mambila 1320 0 8144 14 5 0.9 44.5 0.74

Mandarin Chinese

Short name: mandarin; glottolog name: Mandarin Chinese; glottocode: mand1415; family/type: Sino-Tibetan; macroarea: Eurasia

URL: https://catalog.ldc.upenn.edu/LDC96S34

18.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
33490 0 253229 1914 0.97 8 18.64 1797

annotation types

nature n
[cough] 45
[sigh] 17
laugh 682
talk 32512
NA 234

samples

120 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/mandarin2/0003 169 0 796 2 0 0.9 5.0 0.08
/mandarin2/0022 192 0 1148 3 0 1.0 5.0 0.08
/mandarin2/0027 129 0 1081 2 0 1.1 5.1 0.09
/mandarin2/0029 374 0 2170 2 0 0.9 10.0 0.17
/mandarin2/0030 165 0 1161 2 0 0.9 5.0 0.08
/mandarin2/0104 166 0 1081 2 0 1.0 5.0 0.08
/mandarin2/0106 157 0 1281 2 0 1.0 5.0 0.08
/mandarin2/0110 237 0 1897 2 0 0.9 10.0 0.17
/mandarin2/0111 129 0 1167 2 0 1.0 5.0 0.08
/mandarin2/0626 119 0 923 3 0 0.9 5.3 0.09

Minderico

Short name: minderico; glottolog name: Minderico; glottocode: mind1263; family/type: Indo-European; macroarea: Eurasia

URL: https://hdl.handle.net/1839/f47b19bd-ac9c-434c-b559-c6ea00485f3c

0.7 hours

turns translated words mean.duration talkprop people hours turns_per_h
490 1 5021 4406 0.93 6 0.67 731

annotation types

nature n
talk 489
NA 1

samples

3 sources

source turns translated words people notiming talkprop minutes hours
/minderico1/090408atelier2_2 102 1 1056 2 0 1.0 7.6 0.13
/minderico1/090424estamine_2 216 1 1914 3 0 0.8 18.6 0.31
/minderico1/090913amoroso_vera_2 172 1 2051 2 0 1.0 13.6 0.23

Nahuatl

Short name: nahuatl; glottolog name: Tlaxcala-Puebla-Central Nahuatl; glottocode: cent2132; family/type: Uto-Aztecan; macroarea: North America

URL: http://www.openslr.org/92

43.9 hours

turns translated words mean.duration talkprop people hours turns_per_h
46293 0 393364 4344 1.26 43 43.88 1055

annotation types

nature n
talk 46293

samples

299 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/nahuatl1/Chilc_Botan_MFC307-RMM302_okwilkowit-kwaaokwilkowit-Verbenaceae_2011-07-19-f 236 0 1685 3 0 1.1 12.8 0.21
/nahuatl1/Chilc_Botan_RMM302-EGS301_kwaakwaanakatsitsiin-Rubiaceae_2011-07-15-f 120 0 1248 3 0 1.0 8.4 0.14
/nahuatl1/Chilc_Botan_RMM302-EGS301_tsotsokapahxiwit-Rubiaceae_2011-07-15-g 164 0 1459 3 0 1.0 9.9 0.17
/nahuatl1/Chilc_Botan_RMM302-MJS324_xaalkowit-Piperaceae_2011-07-19-m 138 0 1121 3 0 1.2 7.8 0.13
/nahuatl1/Chilc_Botan_RMM302-MSO325_mowih-Acanthaceae_2011-07-27-j 91 0 1038 3 0 1.1 6.3 0.10
/nahuatl1/Chilc_Botan_RMM302-MSO325_teenkwaakwalaxoochit-Acanthaceae_2011-07-27-k 37 0 446 2 0 1.2 2.7 0.05
/nahuatl1/Chilc_Botan_RMM302-MSO325_tewitsoot-Agavaceae_2011-07-27-a 366 0 4070 3 0 1.2 24.0 0.40
/nahuatl1/Chilc_Botan_RMM302-MSO325_xokotatopoonkowit-Acanthaceae_2011-07-27-l 97 0 988 3 0 1.3 6.4 0.11
/nahuatl1/Chilc_Botan_RMM302_aakiismekat-texokomekat-Vitaceae_2011-07-14-a 201 0 1729 3 0 1.1 12.0 0.20
/nahuatl1/Chilc_Botan_RMM302_aakwitaxoochit-teenkwaakwalaxoochit-Acanthaceae_2008-09-11-a 55 0 590 3 0 1.1 4.3 0.07

Nasal

Short name: nasal; glottolog name: Nasal; glottocode: nasa1239; family/type: Austronesian; macroarea: Papunesia

URL: http://hdl.handle.net/2196/00-0000-0000-0010-798B-E

0.3 hours

turns translated words mean.duration talkprop people hours turns_per_h
907 0.97 2779 1066 0.88 3 0.32 2834

annotation types

nature n
talk 897
NA 10

samples

4 sources

source turns translated words people notiming talkprop minutes hours
/nasal1/NSY-20170711-C 332 0.97 1054 2 0 0.9 7.0 0.12
/nasal1/NSY-20170712-CA 152 0.94 394 2 0 0.7 3.6 0.06
/nasal1/NSY-20170719-C 220 0.99 706 2 0 0.8 5.2 0.09
/nasal1/NSY-20170721-C 203 0.98 625 3 0 1.1 3.3 0.05

Nganasan

Short name: nganasan; glottolog name: Nganasan; glottocode: ngan1291; family/type: Uralic; macroarea: Eurasia

URL: https://corpora.uni-hamburg.de/hzsk/de/islandora/object/spoken-corpus:nslc-0.2

0.5 hours

turns translated words mean.duration talkprop people hours turns_per_h
794 0 3196 2215 1 9 0.49 1620

annotation types

nature n
talk 786
NA 8

samples

5 sources

source turns translated words people notiming talkprop minutes hours
/nganasan1/ChND-KES_061107_Dialog_conv 95 0 335 2 1 1 3.2 0.05
/nganasan1/KES-ChND_080725_Childhood_conv 270 0 1389 2 0 1 11.9 0.20
/nganasan1/KES-PED_080718_Dialog_conv1 24 0 105 3 0 1 2.2 0.04
/nganasan1/KH-KNT_960810_Ngindjili_conv 66 0 278 2 0 1 2.8 0.05
/nganasan1/TTD-ChND_080719_Dialog_conv 339 0 1089 2 0 1 9.1 0.15

N|uu

Short name: nuu; glottolog name: Ghaap-Kalahari; glottocode: nuuu1241; family/type: Tuu; macroarea: Africa

URL: http://hdl.handle.net/2196/4558585e-56ab-4e60-8d8d-5857b2bb96a3

0.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
1210 0.98 8256 1255 0.7 12 0.63 1921

annotation types

nature n
talk 1210

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/nuu1/NC080903-01_A-edited 246 0.98 2116 4 2 0.7 8.8 0.15
/nuu1/NM071213-01_A-edited 964 0.99 6140 9 25 0.7 29.0 0.48

Okiek

Short name: okiek; glottolog name: Okiek; glottocode: okie1245; family/type: Nilotic; macroarea: Africa

URL: NA

0.2 hours

turns translated words mean.duration talkprop people hours turns_per_h
161 1 793 2780 0.8 4 0.16 1006

annotation types

nature n
talk 161

samples

1 sources

source turns translated words people notiming talkprop minutes hours
/okiek1/okiek_conversations001_elar 161 1 793 4 0 0.8 9.4 0.16

San Jerónimo Acazulco Otomi

Short name: otomi; glottolog name: Estado de México Otomi; glottocode: esta1236; family/type: Otomanguean; macroarea: North America

URL: http://hdl.handle.net/2196/e4af5b03-70ce-4dd3-8473-64813a515d8d

0.3 hours

turns translated words mean.duration talkprop people hours turns_per_h
718 0.99 3393 1843 0.93 7 0.35 2051

annotation types

nature n
talk 716
NA 2

samples

3 sources

source turns translated words people notiming talkprop minutes hours
/otomi1/20100712acjs-rc 28 1.00 142 2 0 0.6 1.4 0.02
/otomi1/20100712acpm-sm 141 0.98 655 3 0 1.1 4.6 0.08
/otomi1/20101010acjg-bvmil 549 1.00 2596 3 0 1.1 15.2 0.25

Pagu

Short name: pagu; glottolog name: Pagu; glottocode: pagu1249; family/type: North Halmahera; macroarea: Papunesia

URL: https://hdl.handle.net/1839/00-0000-0000-0022-6530-D

0.7 hours

turns translated words mean.duration talkprop people hours turns_per_h
831 1 3185 1884 0.65 4 0.66 1259

annotation types

nature n
talk 812
NA 19

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/pagu1/PAG-20120422 402 1 1606 2 0 0.7 18.6 0.31
/pagu1/PAG-20120716 429 1 1579 2 0 0.6 21.2 0.35

Pite Saami

Short name: pite_saami; glottolog name: Pite Saami; glottocode: pite1240; family/type: Uralic; macroarea: Eurasia

URL: http://saami.uni-freiburg.de/psdp/

1 hours

turns translated words mean.duration talkprop people hours turns_per_h
1604 0.98 5964 1931 0.87 7 1.01 1588

annotation types

nature n
talk 1582
NA 22

samples

3 sources

source turns translated words people notiming talkprop minutes hours
/pite_saami1/pit080924 692 1.00 2437 2 0 1.0 23.4 0.39
/pite_saami1/pit090519 393 0.96 1275 4 0 0.7 15.0 0.25
/pite_saami1/pit090702 519 0.98 2252 3 0 0.9 22.4 0.37

Polish

Short name: polish; glottolog name: Polish; glottocode: poli1260; family/type: Indo-European; macroarea: Eurasia

URL: http://pelcra.pl/new/spoken_corpora_50

15.8 hours

turns translated words mean.duration talkprop people hours turns_per_h
23851 0 123777 2132 0.9 87 15.78 1511

annotation types

nature n
[cough] 22
[groan] 2
[sigh] 21
[sniff] 30
[yawn] 2
breath 1645
laugh 340
talk 21208
NA 581

samples

28 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/polish1/DS_001 1506 0 3983 3 0 0.9 31.8 0.53
/polish1/DS_002 2028 0 8047 2 0 0.9 59.2 0.99
/polish1/DS_005 1209 0 8617 3 0 1.1 58.5 0.97
/polish1/DS_007 914 0 4743 2 0 0.9 32.9 0.55
/polish1/DS_008 680 0 3988 2 0 1.0 28.3 0.47
/polish1/DS_009 918 0 4265 2 0 0.9 27.9 0.47
/polish1/DS_010 1016 0 4748 3 0 0.9 32.0 0.53
/polish1/DS_011 500 0 3552 3 0 1.0 25.8 0.43
/polish1/DS_012 818 0 2140 3 0 0.9 20.4 0.34
/polish1/DS_013 689 0 4452 3 0 1.0 32.3 0.54

Sakun

Short name: sakun; glottolog name: Sukur; glottocode: suku1272; family/type: Afro-Asiatic; macroarea: Africa

URL: https://www.elararchive.org/dk0252

2 hours

turns translated words mean.duration talkprop people hours turns_per_h
1292 1 8519 2024 0.56 12 2 646

annotation types

nature n
talk 1292

samples

11 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/sakun1/baba1 87 1.00 818 2 0 0.8 5.9 0.10
/sakun1/bull2 44 0.98 416 6 0 0.6 3.0 0.05
/sakun1/bull3 28 1.00 296 3 0 0.9 1.8 0.03
/sakun1/bull5 172 0.99 727 12 0 0.8 4.7 0.08
/sakun1/cattlepen2 129 0.99 996 4 0 0.2 22.2 0.37
/sakun1/newhouse2 43 1.00 464 2 0 0.9 2.5 0.04
/sakun1/pottery1 105 1.00 662 5 0 0.5 7.3 0.12
/sakun1/pottery2 313 1.00 1803 5 0 0.5 22.3 0.37
/sakun1/ran1 69 1.00 514 5 0 0.7 3.3 0.05
/sakun1/thatching1 73 1.00 328 4 0 0.1 14.1 0.23

Sambas

Short name: sambas; glottolog name: Kendayan-Belangin; glottocode: kend1254; family/type: Austronesian; macroarea: Papunesia

URL: https://archive.mpi.nl/tla/islandora/object/tla%3A1839_00_0000_0000_0022_5D7C_E

6.1 hours

turns translated words mean.duration talkprop people hours turns_per_h
51726 0 225681 338 0.17 45 6.13 8438

annotation types

nature n
talk 50617
NA 1109

samples

24 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/sambas1/SBS-20100203 2836 0 13286 3 2836 0 -Inf 0
/sambas1/SBS-20100222a 1824 0 9059 3 1824 0 -Inf 0
/sambas1/SBS-20100222b 2480 0 11258 4 2480 0 -Inf 0
/sambas1/SBS-20100301 595 0 2826 3 595 0 -Inf 0
/sambas1/SBS-20100303 3209 0 15957 4 3209 0 -Inf 0
/sambas1/SBS-20100305 2123 0 11376 5 2123 0 -Inf 0
/sambas1/SBS-20100609 828 0 3213 2 828 0 -Inf 0
/sambas1/SBS-20100617 2928 0 12182 4 2928 0 -Inf 0
/sambas1/SBS-20100709 3130 0 13427 4 3130 0 -Inf 0
/sambas1/SBS-20100710 2653 0 10805 5 2653 0 -Inf 0

Siona

Short name: siona; glottolog name: Siona-Tetete; glottocode: sion1247; family/type: Tucanoan; macroarea: South America

URL: http://hdl.handle.net/2196/00-0000-0000-000D-EA53-3

2.9 hours

turns translated words mean.duration talkprop people hours turns_per_h
2420 0.39 10056 3344 0.78 7 2.9 834

annotation types

nature n
talk 2418
NA 2

samples

14 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/siona1/20101119oispa001 151 0.97 813 2 0 1.2 6.2 0.10
/siona1/20140723salsu002 27 0.26 175 2 0 0.9 2.2 0.04
/siona1/20140804salsu001 133 0.27 421 2 0 0.3 11.8 0.20
/siona1/20140804salsu003 24 0.33 62 2 0 0.7 2.0 0.03
/siona1/20140805salsu003 218 0.41 873 2 0 0.5 26.6 0.44
/siona1/20140805salsu005 287 0.78 1349 2 0 0.9 22.1 0.37
/siona1/20140805salsu010 232 0.46 899 2 0 0.7 22.0 0.37
/siona1/20140805salsu012 267 0.33 958 2 0 0.6 28.9 0.48
/siona1/20140805salsu013 10 0.20 28 2 0 0.3 2.4 0.04
/siona1/20140925salsu001 115 0.36 629 2 0 0.9 9.7 0.16

Siputhi

Short name: siputhi; glottolog name: Swati; glottocode: swat1243; family/type: Atlantic-Congo; macroarea: Africa

URL: http://hdl.handle.net/2196/ebca9f1e-c73c-4d22-8ed8-3abcb2d51ffa

0.3 hours

turns translated words mean.duration talkprop people hours turns_per_h
430 1 1753 2419 1 5 0.29 1483

annotation types

nature n
talk 430

samples

8 sources

source turns translated words people notiming talkprop minutes hours
/siputhi1/20190205_1520_MAT_20200720 20 1 115 2 0 1.0 1.2 0.02
/siputhi1/20190205_1609_MAT_20200928 23 1 208 2 0 1.0 2.4 0.04
/siputhi1/20190207_1454_RAM_20200720 27 1 102 5 0 0.9 1.3 0.02
/siputhi1/20190211_1634_MAK_20200720 49 1 270 2 0 1.0 2.9 0.05
/siputhi1/20190211_1645_MAK_20200329 26 1 169 2 0 0.9 1.6 0.03
/siputhi1/20190213_1307_QOI_20200928 192 1 607 2 0 1.0 5.6 0.09
/siputhi1/20190213_1309_QOI_20200706 58 1 150 2 0 1.1 1.3 0.02
/siputhi1/20190226_1602_MPA_20200720 35 1 132 3 0 1.1 1.2 0.02

Siwu

Short name: siwu; glottolog name: Siwu; glottocode: siwu1238; family/type: Atlantic-Congo; macroarea: Africa

URL: https://hdl.handle.net/1839/c410de17-81eb-4477-ae0d-d43ff1aea085

9.9 hours

turns translated words mean.duration talkprop people hours turns_per_h
18341 0.99 105903 1487 0.77 18 9.94 1845

annotation types

nature n
[nod] 8
laugh 292
talk 17798
NA 243

samples

7 sources

source turns translated words people notiming talkprop minutes hours
/siwu1/Compound 2123 0.96 11667 8 0 1.0 59.8 1.00
/siwu1/Compound_4 1367 1.00 7294 6 0 0.7 64.5 1.08
/siwu1/Maize_1 3865 0.98 23330 8 0 0.6 127.7 2.13
/siwu1/Maize_3 1859 0.97 10910 8 0 0.7 58.0 0.97
/siwu1/Neighbours 4071 1.00 24347 8 0 1.0 106.4 1.77
/siwu1/Two_men_2 1928 0.99 9175 3 0 0.6 59.8 1.00
/siwu1/Two_men_3 3128 1.00 19180 7 0 0.8 119.6 1.99

Southern Pinghua

Short name: southern_pinghua; glottolog name: Southern Pinghua; glottocode: sout3250; family/type: Sino-Tibetan; macroarea: Eurasia

URL: NA

0.9 hours

turns translated words mean.duration talkprop people hours turns_per_h
510 0 9961 6204 1 1 0.88 580

annotation types

nature n
talk 510

samples

1 sources

source turns translated words people notiming talkprop minutes hours
/southern_pinghua1/WCPH007_transcription_20200605 510 0 9961 1 0 1 53 0.88

Southern Qiang

Short name: southern_qiang; glottolog name: Southern Qiang; glottocode: sout2728; family/type: Sino-Tibetan; macroarea: Eurasia

URL: http://hdl.handle.net/2196/00-0000-0000-0012-5FAD-9

1.2 hours

turns translated words mean.duration talkprop people hours turns_per_h
1523 0 4972 1300 0.5 3 1.16 1313

annotation types

nature n
talk 1520
NA 3

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/southern_qiang1/YH-060 235 0 889 2 2 0.2 41.0 0.68
/southern_qiang1/YH-837 1288 0 4083 3 0 0.8 28.6 0.48

Spanish

Short name: spanish; glottolog name: Spanish; glottocode: stan1288; family/type: Indo-European; macroarea: Eurasia

URL: https://catalog.ldc.upenn.edu/LDC96S35

27.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
40202 0 304734 2558 1.04 32 27.63 1455

annotation types

nature n
[clearsthroat] 9
[cough] 6
[sneeze] 3
[sniff] 14
breath 52
laugh 588
talk 38971
NA 559

samples

182 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/spanish2/0053 158 0 980 3 38 1.1 5.0 0.08
/spanish2/0082 122 0 750 2 27 1.0 5.0 0.08
/spanish2/0084 110 0 949 2 29 1.0 5.0 0.08
/spanish2/0085 245 0 2254 2 37 1.0 10.2 0.17
/spanish2/0088 139 0 913 2 31 1.0 5.0 0.08
/spanish2/0096 270 0 1914 2 44 1.0 10.2 0.17
/spanish2/0098 243 0 1770 3 73 1.1 10.0 0.17
/spanish2/0100 317 0 1603 2 99 1.0 10.5 0.17
/spanish2/0291 384 0 2096 2 94 1.0 11.0 0.18
/spanish2/0616 302 0 1996 2 12 1.1 10.1 0.17

Tehuelche

Short name: tehuelche; glottolog name: Tehuelche; glottocode: tehu1242; family/type: Chonan; macroarea: South America

URL: http://hdl.handle.net/2196/00-0000-0000-0011-F549-B

1.5 hours

turns translated words mean.duration talkprop people hours turns_per_h
1562 0 6346 1314 0.4 4 1.5 1041

annotation types

nature n
talk 1561
NA 1

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/tehuelche1/tehuelche16 211 0 757 3 0 0.1 46.2 0.77
/tehuelche1/tehuelche21 1351 0 5589 4 0 0.7 43.8 0.73

Tena Kichwa

Short name: tena_kichwa; glottolog name: Tena Lowland Quichua; glottocode: tena1240; family/type: Quechuan; macroarea: South America

URL: https://www.elararchive.org/dk0312/

1.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
1939 1 8119 2258 0.89 11 1.36 1426

annotation types

nature n
talk 1939

samples

8 sources

source turns translated words people notiming talkprop minutes hours
/tena_kichwa1/ev_24052013_01 59 1 208 4 0 0.7 2.6 0.04
/tena_kichwa1/in_01082013_02 183 1 1013 2 0 1.1 8.6 0.14
/tena_kichwa1/in_01082013_16 52 1 284 2 0 1.0 3.0 0.05
/tena_kichwa1/in_01082013_18 383 1 1501 2 0 0.8 15.1 0.25
/tena_kichwa1/in_01082013_19 323 1 1187 3 0 0.8 12.1 0.20
/tena_kichwa1/in_01082013_20 193 1 874 3 0 0.9 6.9 0.11
/tena_kichwa1/in_01082013_21 386 1 1629 3 0 0.9 15.8 0.26
/tena_kichwa1/in_02072013 360 1 1423 3 0 0.9 18.9 0.31

Totoli

Short name: totoli; glottolog name: Totoli; glottocode: toto1304; family/type: Austronesian; macroarea: Papunesia

URL: https://hdl.handle.net/1839/00-0000-0000-0014-C590-D

1.1 hours

turns translated words mean.duration talkprop people hours turns_per_h
4457 0.7 8625 806 0.89 35 1.11 4015

annotation types

nature n
[cough] 24
talk 3858
NA 575

samples

8 sources

source turns translated words people notiming talkprop minutes hours
/totoli1/chat 169 0.67 324 8 0 1.1 2.1 0.03
/totoli1/Conv_Han_Salma 215 0.45 440 2 0 0.7 4.4 0.07
/totoli1/conversation 654 0.55 1335 7 0 1.1 8.2 0.14
/totoli1/conversation_2 1117 0.68 1978 6 0 0.9 17.2 0.29
/totoli1/conversation_3 98 0.69 206 4 0 0.7 1.9 0.03
/totoli1/language_situation 734 0.82 1417 6 0 1.0 9.5 0.16
/totoli1/silsilah_TTL_2 666 0.87 1373 4 0 0.8 9.8 0.16
/totoli1/village_names_4 804 0.88 1552 3 0 0.8 13.5 0.23

Tseltal

Short name: tseltal; glottolog name: Tzeltal; glottocode: tzel1254; family/type: Mayan; macroarea: North America

URL: https://islandora-ailla.lib.utexas.edu/islandora/object/ailla%3A124445

1.7 hours

turns translated words mean.duration talkprop people hours turns_per_h
2666 1 12796 1639 0.67 33 1.72 1550

annotation types

nature n
talk 2666

samples

3 sources

source turns translated words people notiming talkprop minutes hours
/tseltal1/070627_Cancuc_panaderia_exito 1146 1.00 5732 5 0 0.8 43.9 0.73
/tseltal1/070728_Cancuc_paseo_a_chak_te 1078 0.99 5075 6 0 0.7 39.3 0.65
/tseltal1/080201_3_Tenejapa_Tajimal_Kin_Spayel__Mayil 442 1.00 1989 24 0 0.5 20.5 0.34

Ulwa

Short name: ulwa; glottolog name: Ulwa; glottocode: ulwa1239; family/type: Misumalpan; macroarea: North America

URL: http://hdl.handle.net/2196/00-0000-0000-000F-CB61-A

2.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
3216 0.8 20239 2934 0.97 5 2.64 1218

annotation types

nature n
[cough] 52
[sigh] 3
[sniff] 6
[yawn] 3
laugh 13
talk 3137
NA 2

samples

6 sources

source turns translated words people notiming talkprop minutes hours
/ulwa1/ulwa014 1683 0.88 9151 2 0 1.0 74.4 1.24
/ulwa1/ulwa037 1167 0.00 8574 2 0 0.9 65.8 1.10
/ulwa1/ulwa038 109 0.99 779 2 0 1.0 5.0 0.08
/ulwa1/ulwa040 54 1.00 395 2 0 1.0 2.8 0.05
/ulwa1/ulwa041 66 0.95 439 2 0 0.9 3.3 0.06
/ulwa1/ulwa042 137 0.99 901 2 0 1.0 6.4 0.11

Vamale

Short name: vamale; glottolog name: Vamale; glottocode: vama1243; family/type: Austronesian; macroarea: Papunesia

URL: http://hdl.handle.net/2196/044967e0-e54e-4f00-a979-fb751b2e66cf

1.3 hours

turns translated words mean.duration talkprop people hours turns_per_h
1507 0 9538 2211 0.72 13 1.32 1142

annotation types

nature n
laugh 2
talk 1502
NA 3

samples

4 sources

source turns translated words people notiming talkprop minutes hours
/vamale1/vamale-170723_la-peche_STE 89 0 921 2 0 0.8 5.4 0.09
/vamale1/vamale-170731-cycle_de_vie 548 0 3051 5 0 0.6 31.7 0.53
/vamale1/vamale-170731-demander_main-MS 335 0 2017 3 0 0.8 16.2 0.27
/vamale1/vamale-190830-kito-4 535 0 3549 4 0 0.7 26.1 0.43

Wooi

Short name: wooi; glottolog name: Woi; glottocode: woii1237; family/type: Austronesian; macroarea: Papunesia

URL: https://hdl.handle.net/1839/eb0ab65a-e985-42d1-a9ee-fccdba47a526

0.9 hours

turns translated words mean.duration talkprop people hours turns_per_h
2124 0.6 5415 1116 0.71 64 0.94 2260

annotation types

nature n
[cough] 2
laugh 64
talk 1603
NA 455

samples

14 sources

Showing only the first 10 sources; use allsources=T to show all

source turns translated words people notiming talkprop minutes hours
/wooi1/boatpreparation 102 0.87 314 4 0 0.6 3.1 0.05
/wooi1/BOBO_production-consumption 192 0.68 541 8 0 0.8 5.0 0.08
/wooi1/joking_conversation 80 0.42 126 5 0 1.1 1.2 0.02
/wooi1/KAPUR_production 46 0.52 97 3 0 0.5 1.6 0.03
/wooi1/KEPALADESA_dialog1 135 0.81 446 4 0 0.5 4.7 0.08
/wooi1/kids_cleaningwell 107 0.53 182 4 0 0.9 2.5 0.04
/wooi1/kitchenconversation 322 0.63 800 7 0 0.6 9.3 0.16
/wooi1/Miosnum_dialog_female 84 0.75 287 5 0 0.7 3.6 0.06
/wooi1/Multilog_between_men 56 0.57 197 12 0 0.8 1.4 0.02
/wooi1/PAPEDA_eating1 299 0.70 866 7 0 0.7 6.7 0.11

Yakkha

Short name: yakkha; glottolog name: Yakkha; glottocode: yakk1236; family/type: Sino-Tibetan; macroarea: Eurasia

URL: http://hdl.handle.net/2196/d76bd932-9390-4c02-b7c9-1e8aa76b7234

0.9 hours

turns translated words mean.duration talkprop people hours turns_per_h
1373 1 31830 2590 1.06 8 0.92 1492

annotation types

nature n
laugh 3
talk 1370

samples

5 sources

source turns translated words people notiming talkprop minutes hours
/yakkha1/06_cvs_01 108 1 3050 2 0 0.9 4.1 0.07
/yakkha1/13_cvs_02 123 1 4010 3 0 1.0 6.3 0.10
/yakkha1/28_cvs_04 355 1 7719 5 0 1.1 14.1 0.24
/yakkha1/29_cvs_05 177 1 3596 5 0 1.2 6.5 0.11
/yakkha1/36_cvs_06 610 1 13455 3 0 1.1 24.2 0.40

Yali

Short name: yali; glottolog name: Pass Valley Yali; glottocode: pass1247; family/type: Nuclear Trans New Guinea; macroarea: Papunesia

URL: https://hdl.handle.net/1839/00-0000-0000-0017-EA2D-D

0.6 hours

turns translated words mean.duration talkprop people hours turns_per_h
1311 0.41 5394 1408 0.8 14 0.63 2081

annotation types

nature n
laugh 97
talk 1168
NA 46

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/yali1/conversation_1 538 0.78 1641 6 0 0.7 12.8 0.21
/yali1/conversation_2 773 0.03 3753 8 0 0.9 25.4 0.42

Yélî Dnye

Short name: yeli_dnye; glottolog name: Yele; glottocode: yele1255; family/type: Yele; macroarea: Papunesia

URL: https://hdl.handle.net/1839/00-0000-0000-0000-C274-3

1.2 hours

turns translated words mean.duration talkprop people hours turns_per_h
1896 0 8708 1188 0.5 22 1.18 1607

annotation types

nature n
[cough] 8
[nod] 25
laugh 20
talk 1666
NA 177

samples

3 sources

source turns translated words people notiming talkprop minutes hours
/yeli_dnye1/r03_v19_s2 652 0 2911 6 0 0.4 29.7 0.49
/yeli_dnye1/r03_v20_s5 692 0 2898 11 0 0.5 24.2 0.40
/yeli_dnye1/r03_v21_s1 552 0 2899 6 0 0.6 17.2 0.29

Zaar

Short name: zaar; glottolog name: Saya; glottocode: saya1246; family/type: Afro-Asiatic; macroarea: Africa

URL: http://www.language-archives.org/language/say

0.5 hours

turns translated words mean.duration talkprop people hours turns_per_h
1754 0.95 5608 818 0.83 2 0.49 3580

annotation types

nature n
laugh 25
talk 1701
NA 28

samples

3 sources

source turns translated words people notiming talkprop minutes hours
/zaar1/SAY_BC_CONV_01 343 0.95 1107 2 0 0.9 5.4 0.09
/zaar1/SAY_BC_CONV_02 429 0.96 1367 2 0 0.8 8.9 0.15
/zaar1/SAY_BC_CONV_03 982 0.95 3134 2 0 0.8 15.0 0.25

Zacatepec Chatino

Short name: zacatepec_chatino; glottolog name: Zacatepec Chatino; glottocode: zaca1242; family/type: Otomanguean; macroarea: North America

URL: NA

1.8 hours

turns translated words mean.duration talkprop people hours turns_per_h
2154 0.45 24103 2726 0.82 6 1.8 1197

annotation types

nature n
talk 2154

samples

5 sources

source turns translated words people notiming talkprop minutes hours
/zacetepec_chatino1/zac-2011_06_03-trans_mgh_mcg-sv 171 0.86 1066 2 0 0.9 8.2 0.14
/zacetepec_chatino1/ZAC-2011_06_08-Trans_MGH_SC_FH-sv 226 0.70 1311 2 0 0.6 15.4 0.26
/zacetepec_chatino1/ZAC-2011_06_17-Trans_MGH_AMH_ED-sv 168 0.21 1139 3 0 0.8 8.1 0.13
/zacetepec_chatino1/ZAC-2011_06_22-Trans_MGH_AMH_IHG-sv 274 0.46 1637 2 0 0.8 15.6 0.26
/zacetepec_chatino1/zac-2012_07_11-trans_mgh_mbh_amp 1315 0.00 18950 2 0 1.0 60.5 1.01

Zauzou

Short name: zauzou; glottolog name: Zauzou; glottocode: zauz1238; family/type: Sino-Tibetan; macroarea: Eurasia

URL: http://hdl.handle.net/2196/bc64e9fe-4ce0-4af7-b79d-39d73e6ff66f

1.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
1833 1 15289 2488 0.88 5 1.42 1291

annotation types

nature n
talk 1833

samples

10 sources

source turns translated words people notiming talkprop minutes hours
/zauzou1/170806OuYuHua-WayToMarket1 69 1.00 747 3 0 0.9 3.8 0.06
/zauzou1/170806OuYuHua-WayToMarket2 45 1.00 466 2 0 0.7 3.2 0.05
/zauzou1/170908EverydayConversation-LiLiMeiHouse 349 0.99 2717 3 0 0.8 16.4 0.27
/zauzou1/170913YangLiZhong-RepairPipe 382 0.99 3133 3 0 1.0 13.4 0.22
/zauzou1/170928EverydayConversation-LiYuJiongLiShunXiang 212 1.00 1629 2 0 0.9 9.5 0.16
/zauzou1/180802ConversationInSnackShop 60 1.00 530 2 0 0.7 3.5 0.06
/zauzou1/180816ConversationBetweenRituals2-YangShuShanLiLiMei 53 1.00 464 2 0 1.0 2.3 0.04
/zauzou1/180816ConversationBetweenRituals3-LiLiMeiYangShuShan 78 1.00 797 2 0 1.0 4.3 0.07
/zauzou1/180918Conversation-LiYuJiongYangShuShan-GuestVisit01 326 1.00 2695 3 0 0.9 17.4 0.29
/zauzou1/190521Conversation-ChildBirth03 259 1.00 2111 5 0 0.9 11.8 0.20

Examples of reasons for exclusion

While every single corpus considered here represents an immensely valuable record of communicative behaviour and linguistic resources used in interaction, differences in annotation standards mean that not all corpora are equally useful for all kinds of purposes.

For instance, a corpus might consist of a large amount of transcribed segments that can be useful for purposes relating to automatic speech recognition; but it may be mostly monologic, which makes it harder to use for the analysis of interactional infrastructure. Or a corpus may provide sufficient data to be used for some corpus linguistic analyses of broad grammatical structures, but its annotations may only be roughly aligned with the actual speech signal, making it hard to use for speech recognition or conversation analytical purposes.

In this section we discuss a number of examples of corpora along with possible reasons for excluding them from some kinds of analyses.

Duoxu

Duoxu is a small corpus (a little over 300 annotations) that is mostly monologic. While each of the sessions contains at least 2 participants (qualifying for inclusion), the actual interactions show little dyadic interaction. That only ~70 out of ~350 annotations count as transitions between participants means that most conversations consist of turns produced in succession by one participant without interactive contributions by the other.

This means that the Duoxu corpus may be useful for phonetic or morphosyntactic research, but that it doesn’t provide sufficient stretches of casual conversation to inform analyses of interactional infrastructure.

inspect_corpus("duoxu")

0.4 hours

turns translated words mean.duration talkprop people hours turns_per_h
327 0.5 3128 3530 0.8 4 0.4 818

annotation types

nature n
talk 327

samples

2 sources

source turns translated words people notiming talkprop minutes hours
/duoxu1/duoxu800 157 1 1460 2 0 0.7 11.8 0.2
/duoxu1/duoxu801 170 0 1668 2 0 0.9 11.8 0.2
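
The transition count mentioned above (roughly 70 of 350 annotations, or about one in five) can be approximated by counting how often the speaker changes from one annotation to the next. The following is a minimal sketch in base R on made-up data; the data frame `annotations` and its columns are hypothetical and do not reflect the actual Duoxu records or the pipeline used for this report.

```r
# Hypothetical toy data: one row per annotation, in temporal order.
annotations <- data.frame(
  participant = c("A", "A", "A", "B", "A", "A", "A", "A", "B", "B"),
  begin = seq(0, 9000, by = 1000)  # start times in ms (illustrative only)
)

# A transition is an annotation whose speaker differs from the previous one.
n <- nrow(annotations)
speaker_change <- annotations$participant[-1] != annotations$participant[-n]

# Share of annotations that follow a change of speaker.
transition_share <- sum(speaker_change) / n
transition_share
```

A low share suggests long runs of turns by a single participant, which is the pattern described for Duoxu.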

Hungarian

Hungarian is an enormous and well-transcribed corpus, but stands out among other large corpora in having a very large number of transitions timed at exactly 0. Over 27% of all speaker transitions are timed like this, which makes it an outlier relative to other corpora.

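# Requires the tidyverse (dplyr, tidyr, ggplot2) and ggthemes (for theme_tufte).
# `d` is assumed to be the data frame of annotations loaded earlier in this report.
library(tidyverse)
library(ggthemes)

# Compare the timing of transitions (FTO) in two-participant Dutch and Hungarian data
d %>%
  filter(language %in% c("dutch", "hungarian"),
         participants == 2) %>%
  drop_na(FTO) %>%
  ggplot(aes(FTO)) +
  theme_tufte() +
  ggtitle("Comparing timing distributions in Dutch and Hungarian corpora") +
  geom_vline(xintercept = 0, colour = "#cccccc") +  # reference line at FTO = 0
  geom_density(trim = TRUE) +
  xlim(c(-2000, 2000)) +
  facet_wrap(~ language)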
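
The proportion of transitions timed at exactly 0 can be checked along the following lines. This is a sketch in base R on made-up values; the vector `fto` is hypothetical and only mirrors the `FTO` column used above, with `NA` standing for annotations that are not transitions.

```r
# Hypothetical transition timings in milliseconds, one value per annotation.
# NA marks annotations that are not speaker transitions.
fto <- c(0, 120, -250, 0, NA, 430, 0, -80, 0, 60, NA, 310)

# Keep only actual transitions, then compute the share timed at exactly 0.
transitions <- fto[!is.na(fto)]
zero_share <- mean(transitions == 0)
zero_share
```

Applied per corpus, a share far above that of comparable corpora would flag the kind of outlier described for Hungarian.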

Nahuatl

The Nahuatl corpus originated as recordings of ethnobotanical elicitation sessions and is a formidable resource made available through OpenSLR. Both the mode of interaction and the way it has been segmented make it hard to use, without considerable additional work, for sequential or interactional analyses of joint action, timing, and turn-taking.

Many of the Nahuatl recordings are monologues (as in the two lower examples) or highly skewed dialogues with one speaker supplying ethnobotanical identifications and another speaker providing relatively minimal responses. When there is more interaction, as in the first two examples, its segmentation bears limited relation to the speech signal. Annotations are either fully overlapping or exactly non-overlapping. Partial overlaps are rare.

nahuatl_uids <- c("nahuatl-041-082-141587",
                  "nahuatl-066-344-732468",
                  "nahuatl-244-109-412319",
                  "nahuatl-273-239-1014736")

convplot(nahuatl_uids, content = TRUE, window = 15000, dyads = TRUE)

c("nahuatl-041-082-141587", "nahuatl-066-344-732468", "nahuatl-244-109-412319", "nahuatl-273-239-1014736")
[1] "seeing 4 dyads in 4 non-overlapping extracts"
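
One way to quantify the segmentation pattern described above is to label each pair of consecutive annotations as non-overlapping, fully overlapping, or partially overlapping. The sketch below assumes annotations sorted by start time with `begin` and `end` values in milliseconds; these names, the toy values, and the helper `classify_overlap()` are hypothetical.

```r
# Hypothetical helper: classify how each annotation relates in time to the
# one before it. Assumes `begin` and `end` are sorted by `begin`.
classify_overlap <- function(begin, end) {
  n <- length(begin)
  prev_end   <- end[-n]
  next_begin <- begin[-1]
  next_end   <- end[-1]
  ifelse(next_begin >= prev_end, "none",      # starts at or after the previous end
    ifelse(next_end <= prev_end, "full",      # lies entirely within the previous one
           "partial"))                        # starts inside, ends after
}

# Toy annotation times in ms; all values are invented.
begin <- c(0, 1000, 1000, 3000, 3500)
end   <- c(1000, 2500, 2500, 4000, 4500)

labels <- classify_overlap(begin, end)
table(labels)
```

A corpus segmented like the Nahuatl recordings would show almost no "partial" cases in such a tally.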

Akie and Mambila

Akie and Mambila are further examples of corpora in which the timing of annotations does not conform to the actual speech signal. The main observation here is that all annotations are mutually exclusive in time: there is never any overlap. Given how turn-taking is usually timed in interaction, this cannot represent the actual temporal distribution of turns, and indeed inspection of the audio recordings for these corpora shows that it does not. In effect, what is transcribed in an annotation roughly corresponds to a turn at talk, but the details of the timing of this turn, such as its duration and its precise placement in relation to others’ turns, cannot be treated as accurate.

While these corpora do lend themselves to several forms of linguistic analysis, their method of segmentation means that it would take considerable additional work to use this data in analyses of timing and turn-taking as well as for qualitative and quantitative analysis of talk-in-interaction.

example_uids <- c("akie-1-084-198851",
                  "akie-1-154-328594",
                  "mambila-1-0156-288901",
                  "mambila-1-0959-1813440")

convplot(example_uids, content = TRUE, window = 15000, dyads = TRUE)

c("akie-1-084-198851", "akie-1-154-328594", "mambila-1-0156-288901", "mambila-1-0959-1813440")
[1] "seeing 4 dyads in 4 non-overlapping extracts"
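
A simple screening step for this kind of segmentation is to list corpora in which no transition overlaps at all. The sketch below assumes that a negative `FTO` marks a transition that starts before the previous turn has ended; that sign convention, the toy values, and the variable names other than `language` and `FTO` are assumptions for illustration.

```r
# Toy transitions; all values are invented.
toy <- data.frame(
  language = c("akie", "akie", "akie", "dutch", "dutch", "dutch"),
  FTO      = c(0, 120, 640, -310, 90, -45)
)

# Share of transitions in overlap (assumed: FTO < 0), per language.
overlap_share <- tapply(toy$FTO < 0, toy$language, mean)

# Languages with no overlapping transitions at all are candidates for
# closer inspection of the segmentation against the audio.
suspicious <- names(overlap_share)[as.vector(overlap_share == 0)]
suspicious
```

Such a flag only narrows down where to look; as noted above, the audio recordings remain the basis for judging whether the timing is accurate.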

Overall overviews

The following figures give an impression of hours, turns, annotation density and annotation length for the whole set of languages. Hours and turns are log-scaled in this overview because the largest corpora dwarf many smaller ones.